Prerequisites

The EDL Pipeline requires a Unix-like environment (Linux, macOS, or WSL on Windows) with Python 3.7+.
Windows Users: Use WSL (Windows Subsystem for Linux) or Git Bash. Native Windows Command Prompt may have issues with curl commands and path handling.

System Requirements

Python Version

Python 3.7 or higher (tested on 3.8-3.11)

Disk Space

Minimum 500 MB free (2 GB recommended for OHLCV data)

Network

Stable internet connection (pipeline fetches 30+ MB of data)

Memory

4 GB RAM minimum (8 GB recommended)

Installation Steps

Step 1: Verify Python Installation

Check that Python 3 is installed:
python3 --version
Expected output (your installed version may differ):
Python 3.8.10
If Python is not installed, download from python.org or use your system’s package manager:
# macOS (Homebrew)
brew install python3

# Ubuntu/Debian
sudo apt update && sudo apt install python3 python3-pip

# Fedora/RHEL
sudo dnf install python3 python3-pip
Step 2: Install Python Dependencies

The pipeline requires three core Python packages:
pip3 install requests pandas beautifulsoup4
Or use a requirements file:
requirements.txt
requests>=2.28.0
pandas>=1.5.0
beautifulsoup4>=4.11.0
pip3 install -r requirements.txt
Package          Version    Purpose
requests         >=2.28.0   HTTP client for API calls to Dhan, NSE endpoints
pandas           >=1.5.0    OHLCV data processing, CSV parsing (NSE listings)
beautifulsoup4   >=4.11.0   HTML parsing for surveillance lists (Google Sheets fallback)
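Beyond importing the packages, a small offline smoke test can exercise all three together. This is a generic sanity check written for this guide, not part of the pipeline itself (the HTML fragment and symbol are made up):

```python
# Offline smoke test for the three core dependencies (no network access).
import requests
import pandas as pd
from bs4 import BeautifulSoup

# bs4: parse a small HTML fragment, as the surveillance-list fallback would
soup = BeautifulSoup("<table><tr><td>RELIANCE</td></tr></table>", "html.parser")
symbol = soup.find("td").text

# pandas: build a tiny OHLCV-shaped frame from the parsed value
df = pd.DataFrame({"symbol": [symbol], "close": [2900.0]})

# requests: confirm the library loads and reports its version (no HTTP call)
print(requests.__version__)
print(df.to_dict("records"))
```

If any of the three packages is broken, this fails at the corresponding step rather than at a generic import line.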
Step 3: Verify Installation

Confirm all dependencies are installed:
python3 -c "import requests, pandas, bs4; print('All dependencies OK')"
Expected output:
All dependencies OK
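The import test confirms the packages load but not that they meet the minimums in requirements.txt. A hypothetical version-check script (not part of the pipeline; requires Python 3.8+ for importlib.metadata) could compare them:

```python
# Check installed package versions against the requirements.txt minimums.
# meets_minimum/parse are helpers written for this guide, not pipeline code.
from importlib.metadata import version, PackageNotFoundError

MINIMUMS = {"requests": "2.28.0", "pandas": "1.5.0", "beautifulsoup4": "4.11.0"}

def parse(v: str) -> tuple:
    # Keep only the leading numeric components, e.g. "2.31.0" -> (2, 31, 0).
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)

def meets_minimum(installed: str, required: str) -> bool:
    return parse(installed) >= parse(required)

for pkg, minimum in MINIMUMS.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
    print(f"{pkg} {installed} (need >={minimum}): {status}")
```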
Step 4: Locate the Pipeline Directory

Navigate to the EDL Pipeline source code. Quote only the directory name, not the tilde — a tilde inside quotes is not expanded by the shell:
cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE"
Verify the master runner script exists:
ls -l run_full_pipeline.py
DO NOT DELETE or RENAME this directory. The folder name is intentionally explicit to prevent accidental removal. All pipeline scripts use relative paths and expect to run from this directory.
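Because the scripts resolve everything relative to the current directory, one defensive pattern is to pin the working directory to the script's own location at startup. This is a hypothetical sketch for illustration, not something the pipeline scripts are documented to do:

```python
# Hypothetical guard: make relative paths resolve against the script's own
# directory, no matter where the interpreter was launched from.
import os
from pathlib import Path

def pin_working_directory(script_file: str) -> Path:
    """chdir to the directory containing script_file and return it."""
    script_dir = Path(script_file).resolve().parent
    os.chdir(script_dir)
    return script_dir

# Typical use at the top of a script:
# pin_working_directory(__file__)
```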
Step 5: Verify Directory Structure

The pipeline directory should contain these core scripts:
ls -1 *.py
Expected output (the master runner plus 18 core scripts):
run_full_pipeline.py              # Master runner
fetch_dhan_data.py                # Phase 1: Core data
fetch_fundamental_data.py         # Phase 1: Fundamentals
fetch_company_filings.py          # Phase 2: Filings
fetch_new_announcements.py        # Phase 2: Announcements
fetch_advanced_indicators.py      # Phase 2: Indicators
fetch_market_news.py              # Phase 2: News
fetch_corporate_actions.py        # Phase 2: Corporate actions
fetch_surveillance_lists.py       # Phase 2: ASM/GSM
fetch_circuit_stocks.py           # Phase 2: Circuits
fetch_bulk_block_deals.py         # Phase 2: Bulk deals
fetch_incremental_price_bands.py  # Phase 2: Price bands
fetch_complete_price_bands.py     # Phase 2: Price bands
fetch_all_ohlcv.py                # Phase 2.5: OHLCV
bulk_market_analyzer.py           # Phase 3: Base JSON
advanced_metrics_processor.py     # Phase 4: Metrics
process_earnings_performance.py   # Phase 4: Earnings
enrich_fno_data.py                # Phase 4: F&O data
add_corporate_events.py           # Phase 4: Events (LAST)
These scripts are NOT part of the main pipeline but can be run manually:
fetch_all_indices.py          # 194 market indices
fetch_etf_data.py             # 361 ETFs
fetch_fno_data.py             # 207 F&O stocks
fetch_fno_lot_sizes.py        # F&O lot sizes
fetch_fno_expiry.py           # Expiry calendar
single_stock_analyzer.py      # Single stock inspector
pipeline_utils.py             # Shared utilities
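Instead of eyeballing the `ls` output, a short script can report exactly which expected files are absent. The helper below is written for this guide (the script names are taken from the listing above):

```python
# Hypothetical helper: report which expected core scripts are missing
# from a directory. Names come from the expected-output listing above.
from pathlib import Path

CORE_SCRIPTS = [
    "run_full_pipeline.py", "fetch_dhan_data.py", "fetch_fundamental_data.py",
    "fetch_company_filings.py", "fetch_new_announcements.py",
    "fetch_advanced_indicators.py", "fetch_market_news.py",
    "fetch_corporate_actions.py", "fetch_surveillance_lists.py",
    "fetch_circuit_stocks.py", "fetch_bulk_block_deals.py",
    "fetch_incremental_price_bands.py", "fetch_complete_price_bands.py",
    "fetch_all_ohlcv.py", "bulk_market_analyzer.py",
    "advanced_metrics_processor.py", "process_earnings_performance.py",
    "enrich_fno_data.py", "add_corporate_events.py",
]

def find_missing(directory: str, names=CORE_SCRIPTS) -> list:
    base = Path(directory)
    return [n for n in names if not (base / n).is_file()]

# Run from the pipeline directory; this should print an empty list:
# print(find_missing("."))
```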
Step 6: Test Run (Dry Run)

Verify the pipeline can start without errors:
python3 -c "import run_full_pipeline; print('Pipeline module loaded successfully')"
Or run a quick test with a single script:
python3 fetch_dhan_data.py
This should create two files:
  • dhan_data_response.json (~5 MB)
  • master_isin_map.json (~200 KB)
Verify:
ls -lh dhan_data_response.json master_isin_map.json
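A slightly stronger check than `ls -lh` is to confirm each file exists, has non-trivial size, and actually parses as JSON. This helper is a sketch for this guide; the size threshold is rough, not an exact expectation:

```python
# Hypothetical sanity check: each output file exists, is non-trivial in
# size, and parses as JSON. The byte threshold is approximate.
import json
from pathlib import Path

def check_output(path: str, min_bytes: int = 100) -> str:
    p = Path(path)
    if not p.is_file():
        return "MISSING"
    if p.stat().st_size < min_bytes:
        return "TOO SMALL"
    try:
        json.loads(p.read_text())
    except json.JSONDecodeError:
        return "INVALID JSON"
    return "OK"

for name in ("dhan_data_response.json", "master_isin_map.json"):
    print(name, check_output(name))
```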

Directory Structure After First Run

After running the pipeline once, your directory will look like this:
DO NOT DELETE EDL PIPELINE/
├── run_full_pipeline.py
├── 18 pipeline scripts (fetch_*.py, analyzers, processors)
├── all_stocks_fundamental_analysis.json.gz  # PRIMARY OUTPUT (2-4 MB)
└── ohlcv_data/                              # OHLCV cache (if FETCH_OHLCV = True)
    ├── RELIANCE.csv
    ├── TCS.csv
    └── ... (2,775 CSV files)
Recommended: Keep CLEANUP_INTERMEDIATE = True to save disk space. The compressed output contains all data needed for analysis.
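The compressed primary output can be read with the standard library alone. Its internal structure is covered in the Field Reference; the sketch below just loads the file, without assuming any particular field layout:

```python
# Read the compressed primary output using only the standard library.
import gzip
import json

def load_analysis(path: str = "all_stocks_fundamental_analysis.json.gz"):
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)

# Example (after a pipeline run):
# data = load_analysis()
# print(type(data), len(data))
```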

Network Configuration

The pipeline makes HTTP requests to multiple endpoints:
Endpoint                    Purpose                                     Rate Limit
ow-scanx-analytics.dhan.co  Full market scan, corporate actions         Thread pool: 1
open-web-scanx.dhan.co      Fundamental data                            Thread pool: 1
ow-static-scanx.dhan.co     Filings, announcements, indicators, deals   Thread pool: 15-50
news-live.dhan.co           Real-time news feed                         Thread pool: 15
openweb-ticks.dhan.co       OHLCV historical data                       Thread pool: 15
nsearchives.nseindia.com    Listing dates, price bands                  Direct curl
Google Sheets (fallback)    Surveillance lists                          Direct requests
Firewall/Proxy Users: Ensure outbound HTTPS (port 443) is allowed for:
  • *.dhan.co
  • nsearchives.nseindia.com
  • docs.google.com (for surveillance list fallback)
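For administrators auditing the allowlist, a hypothetical helper (written for this guide) can check whether a given URL's host is covered by the wildcard patterns above:

```python
# Hypothetical helper: check whether a URL's host matches the firewall
# allowlist patterns above (fnmatch-style wildcards, no network access).
from fnmatch import fnmatch
from urllib.parse import urlparse

ALLOWLIST = ("*.dhan.co", "nsearchives.nseindia.com", "docs.google.com")

def host_allowed(url: str, patterns=ALLOWLIST) -> bool:
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, p) for p in patterns)

print(host_allowed("https://ow-scanx-analytics.dhan.co"))  # True
print(host_allowed("https://example.com"))                 # False
```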

Validation Checklist

Before running the full pipeline, verify:
1. Python Dependencies

python3 -c "import requests, pandas, bs4; print('✅ All dependencies OK')"
2. Network Connectivity

curl -s -o /dev/null -w "%{http_code}" https://ow-scanx-analytics.dhan.co
Expected: 200 or 405 (endpoint exists)
3. Disk Space

df -h . | tail -1 | awk '{print $4 " available"}'
Ensure at least 500 MB free (2 GB if using OHLCV)
4. Write Permissions

touch test.json && rm test.json && echo "✅ Write permission OK"
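The disk-space and write-permission checks above can also be scripted in Python, which avoids parsing platform-specific `df` output. This preflight function is a sketch written for this guide (network and dependency checks are covered by the earlier items):

```python
# Hypothetical preflight: combine the disk-space and write-permission
# checks using the standard library instead of df/touch.
import shutil
import tempfile

def preflight(path: str = ".", min_free_mb: int = 500) -> list:
    problems = []
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free, need {min_free_mb} MB")
    try:
        # Creating (and auto-deleting) a temp file proves write permission.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        problems.append(f"no write permission in {path}")
    return problems

print(preflight() or "All preflight checks passed")
```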

Troubleshooting Installation

Import errors (e.g. ModuleNotFoundError for requests, pandas, or bs4)
Cause: Dependencies not installed in the correct Python environment.
Solution:
# Ensure you're using the same python3 binary
which python3

# Install with explicit python3 pip
python3 -m pip install requests pandas beautifulsoup4

# Verify installation
python3 -m pip list | grep -E '(requests|pandas|beautifulsoup4)'
Permission denied when running scripts
Cause: Scripts lack execute permissions.
Solution:
# Make scripts executable
chmod +x *.py

# Or run with python3 explicitly
python3 run_full_pipeline.py
curl: command not found
Cause: curl is not installed.
Solution:
# macOS: curl is pre-installed
# Ubuntu/Debian:
sudo apt install curl

# Fedora/RHEL:
sudo dnf install curl

# Verify:
curl --version
Impact if not fixed: Listing dates will be missing, but pipeline will continue (non-critical).
SSL certificate verification errors
Cause: Corporate proxy or outdated CA certificates.
Solution:
# Update CA certificates
# Ubuntu/Debian:
sudo apt update && sudo apt install ca-certificates

# macOS:
/Applications/Python\ 3.X/Install\ Certificates.command

# Or temporarily disable SSL verification (NOT RECOMMENDED for production):
# Add to fetch scripts:
# response = requests.post(url, json=payload, headers=headers, verify=False)
Timeouts or hangs while fetching data
Cause: Slow network or rate limiting.
Solution:
# First run: Expect 30-40 min for lifetime OHLCV download
# If timing out repeatedly, increase timeout in fetch_all_ohlcv.py (line ~50):
# timeout=30 → timeout=60

# Or skip OHLCV for faster pipeline:
# Edit run_full_pipeline.py:
FETCH_OHLCV = False
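Beyond raising the timeout, transient failures can be absorbed with retries and exponential backoff. The wrapper below is a generic sketch for this guide, not pipeline code; fetch_fn stands for any callable that raises on failure (for example, a requests call inside fetch_all_ohlcv.py):

```python
# Hypothetical retry wrapper with exponential backoff. fetch_fn is any
# callable that raises on failure and returns data on success.
import time

def fetch_with_retry(fetch_fn, retries: int = 3, base_delay: float = 1.0):
    for attempt in range(retries):
        try:
            return fetch_fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```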
Optional: Virtual Environment

To isolate dependencies from the system Python:
1. Create Virtual Environment

python3 -m venv edl-env
2. Activate Environment

# Linux/macOS:
source edl-env/bin/activate

# Windows (WSL):
source edl-env/bin/activate
Your prompt should now show (edl-env).
3. Install Dependencies

pip install requests pandas beautifulsoup4
4. Run Pipeline

python run_full_pipeline.py
5. Deactivate (when done)

deactivate

Next Steps

Quick Start Guide

Run your first pipeline and explore the output

Pipeline Settings

Customize pipeline behavior

Pipeline Architecture

Understand the pipeline phases

Field Reference

Complete guide to all 86 output fields